1. Always look at your data

Toolik Station (LTER) meteorological data (Source: Source: Shaver, G. 2019. A multi-year DAILY file for the Toolik Field Station at Toolik Lake, AK starting 1988 to present. ver 4. Environmental Data Initiative.)

Notice that the date parsed (assumed class) as character. That limits the nice time series features we can use, so we’ll quickly convert it into a tsibble (a time series data frame) so that we can use functions in feasts and fable to explore & analyze it.

Read in data:

toolik <- read_csv(here("data","toolikweather.csv"))

Convert the data frame to a tsibble

Go ahead and try plotting the data as imported.

ggplot(data = toolik, aes(x = date, y = mean_airtemp)) +
  geom_line()

# Booo we get a warning (only one observation per series)

Notice that it doesn’t work - because R doesn’t understand the date is a date until we tell it.

Let’s go ahead and convert it to a tsibble using the as_tsibble() function. First, we’ll need to convert the date to a date class, then convert to a tsibble:

toolik_ts <- toolik %>% 
  mutate(date = lubridate::mdy(date)) %>% 
  as_tsibble(key = NULL, index = date)

Now let’s plot it:

ggplot(data = toolik_ts, aes(x = date, y = mean_airtemp)) +
  geom_line() +
  labs(x = "Date",
       y = "Mean daily air temperature (Celsius)\n at Toolik Station")

We need to ask some big picture questions at this point, like:

  • Does there appear to be an overall trend? No.
  • Does there appear to be seasonality? Yes.
  • Does there appear to be cyclicality? Unsure.
  • Any notable outliers or additional patterns? No noted.

2. Use index_by() to aggregate time series by increments

We will use index_by() instead of group_by() to do the trick. See ?index_by() to group by a time index, then summarize() to specify what to calulate & return for each interval.

toolik_month <- toolik_ts %>% 
  index_by(yr_mo = ~yearmonth(.)) %>% 
  summarize(monthly_mean_temp = mean(mean_airtemp, na.rm = TRUE))

Now let’s take a look:

ggplot(data = toolik_month, aes(x = yr_mo, y = monthly_mean_temp)) +
  geom_line() 

# Or break it up by month: 
toolik_month %>% 
  ggplot(aes(x = year(yr_mo), y = monthly_mean_temp)) +
  geom_line() +
  facet_wrap(~month(yr_mo, label = TRUE)) +
  labs(x = "Year",
       y = "Annual mean air temperature (Celsius)",
       title = "Toolik Station mean annual air temperature",
       subtitle = "1988 - 2018",
       caption = "Source: Shaver, G. 2019. A multi-year DAILY \nweather file for the Toolik Field Station at Toolik Lake, AK\n starting 1988 to present. ver 4. Environmental Data Initiative.")

Can you do other increments with index_by()? Absolutely! See ?index_by() for grouping options!

Let’s find the yearly average for 2000:

toolik_annual <- toolik_ts %>% 
  index_by(yearly = ~year(.)) %>% 
  summarize(annual_airtemp = mean(mean_airtemp, na.rm = TRUE))

ggplot(data = toolik_annual, aes(x = yearly, y = annual_airtemp)) +
  geom_line()

And how about a weekly average?

toolik_weekly <- toolik_ts %>% 
  index_by(weekly = ~yearweek(.)) %>% 
  summarize(weekly_airtemp = mean(mean_airtemp, na.rm = TRUE))

ggplot(data = toolik_weekly, aes(x = weekly, y = weekly_airtemp)) +
  geom_line()

## 3. Use filter_index() to filter by date-times!

We can use filter_index() specifically to help us filter data by time spans. See ?filter_index() for more information.

Example 1: Filter from June 2000 through October 2001

toolik_ts %>% 
  filter_index("2000-06" ~ "2001-10")
## # A tsibble: 518 x 5 [1D]
##    date       station              mean_airtemp daily_precip mean_windspeed
##    <date>     <chr>                       <dbl>        <dbl>          <dbl>
##  1 2000-06-01 Toolik Field Station          2.5          0              3.8
##  2 2000-06-02 Toolik Field Station         -3            6.6            2.3
##  3 2000-06-03 Toolik Field Station         -1.3          0.8            2.1
##  4 2000-06-04 Toolik Field Station          4.8          1.3            2  
##  5 2000-06-05 Toolik Field Station          8.5          0              3.2
##  6 2000-06-06 Toolik Field Station         12.8          0.8            4.1
##  7 2000-06-07 Toolik Field Station         12.9          0              2.8
##  8 2000-06-08 Toolik Field Station          9.5          0              3.2
##  9 2000-06-09 Toolik Field Station          9.6          1              2.7
## 10 2000-06-10 Toolik Field Station         12.3          0              3.1
## # … with 508 more rows

Example 2: Filter from April 10, 2006 to May 15, 2006

toolik_ts %>% 
  filter_index("2006-04-10" ~ "2006-05-15")
## # A tsibble: 36 x 5 [1D]
##    date       station              mean_airtemp daily_precip mean_windspeed
##    <date>     <chr>                       <dbl>        <dbl>          <dbl>
##  1 2006-04-10 Toolik Field Station        -18.5            0             NA
##  2 2006-04-11 Toolik Field Station        -13.5            0             NA
##  3 2006-04-12 Toolik Field Station        -22.5            0             NA
##  4 2006-04-13 Toolik Field Station        -24              0             NA
##  5 2006-04-14 Toolik Field Station        -24.6            0             NA
##  6 2006-04-15 Toolik Field Station        -22.2            0             NA
##  7 2006-04-16 Toolik Field Station        -19.6            0             NA
##  8 2006-04-17 Toolik Field Station        -25              0             NA
##  9 2006-04-18 Toolik Field Station        -21.9            0             NA
## 10 2006-04-19 Toolik Field Station        -26.2            0             NA
## # … with 26 more rows

Example 3: Filter from December 20, 2017 to the end of the dataset

toolik_ts %>% 
  filter_index("2017-12-20" ~ .)
## # A tsibble: 377 x 5 [1D]
##    date       station              mean_airtemp daily_precip mean_windspeed
##    <date>     <chr>                       <dbl>        <dbl>          <dbl>
##  1 2017-12-20 Toolik Field Station         -6.5          0              6.8
##  2 2017-12-21 Toolik Field Station         -3.4          0              7.7
##  3 2017-12-22 Toolik Field Station         -1.7          0              6.3
##  4 2017-12-23 Toolik Field Station         -1.4          0              6.6
##  5 2017-12-24 Toolik Field Station         -3.9          0              4.4
##  6 2017-12-25 Toolik Field Station         -7.4          0              2.7
##  7 2017-12-26 Toolik Field Station        -12.9          0.2            1.9
##  8 2017-12-27 Toolik Field Station        -12.5          0.8            1.3
##  9 2017-12-28 Toolik Field Station        -17.1          0.3            0.3
## 10 2017-12-29 Toolik Field Station        -22.6          0.1            0.9
## # … with 367 more rows

4. Explore changes in seasonality with seasonplots

Let’s look at seasonality over the years with a seasonplot, using the feasts::gg_season() function. Notice that we can still do wrangling on a tsibble like we would with a normal data frame:

toolik_ts %>% 
  filter(year(date) > 2014) %>% 
  gg_season(y = mean_airtemp)

Daily measurements seems a bit excessive to return in this visualization, right? Maybe it makes more sense to use the monthly averages in ``.

# Now a season plot: 
toolik_month %>% 
  gg_season(y = monthly_mean_temp) +
  theme_minimal() +
  labs(x = "Year",
       y = "Mean monthly air temperature (Celsius)",
       title = "Toolik Station air temperature")

5. Seasonal subseries plots

Sometimes it can be useful to explore how values within one season/month/etc. change over time (e.g. across years).

We can use gg_subseries() to explore how values change within a specified window over time.

Do you notice any trends that differ across the months?

toolik_month %>% 
  gg_subseries(monthly_mean_temp)

6. Moving averages in tsibbles

We’ll use the slider package to find moving (or rolling) averages for different window sizes.

The general structure will tend to be something like:

df %>% slide(variable, function, .before = , .after = )

Let’s make a test vector just so we can see how this works:

set.seed(2021)
test<- rnorm(100, mean = 40, sd = 10)

# Show the series based on values +2 and -2 from each observation
# Use ~.x to show the windows
slide(test, ~.x, .before = 2, .after = 2)
## [[1]]
## [1] 38.77540 45.52457 43.48650
## 
## [[2]]
## [1] 38.77540 45.52457 43.48650 43.59632
## 
## [[3]]
## [1] 38.77540 45.52457 43.48650 43.59632 48.98054
## 
## [[4]]
## [1] 45.52457 43.48650 43.59632 48.98054 20.77430
## 
## [[5]]
## [1] 43.48650 43.59632 48.98054 20.77430 42.61744
## 
## [[6]]
## [1] 43.59632 48.98054 20.77430 42.61744 49.15566
## 
## [[7]]
## [1] 48.98054 20.77430 42.61744 49.15566 40.13772
## 
## [[8]]
## [1] 20.77430 42.61744 49.15566 40.13772 57.29963
## 
## [[9]]
## [1] 42.61744 49.15566 40.13772 57.29963 29.17795
## 
## [[10]]
## [1] 49.15566 40.13772 57.29963 29.17795 37.27175
## 
## [[11]]
## [1] 40.13772 57.29963 29.17795 37.27175 41.81995
## 
## [[12]]
## [1] 57.29963 29.17795 37.27175 41.81995 55.08542
## 
## [[13]]
## [1] 29.17795 37.27175 41.81995 55.08542 56.04470
## 
## [[14]]
## [1] 37.27175 41.81995 55.08542 56.04470 21.58524
## 
## [[15]]
## [1] 41.81995 55.08542 56.04470 21.58524 56.23310
## 
## [[16]]
## [1] 55.08542 56.04470 21.58524 56.23310 41.31389
## 
## [[17]]
## [1] 56.04470 21.58524 56.23310 41.31389 54.81122
## 
## [[18]]
## [1] 21.58524 56.23310 41.31389 54.81122 55.13318
## 
## [[19]]
## [1] 56.23310 41.31389 54.81122 55.13318 30.57557
## 
## [[20]]
## [1] 41.31389 54.81122 55.13318 30.57557 38.14315
## 
## [[21]]
## [1] 54.81122 55.13318 30.57557 38.14315 28.98875
## 
## [[22]]
## [1] 55.13318 30.57557 38.14315 28.98875 52.08115
## 
## [[23]]
## [1] 30.57557 38.14315 28.98875 52.08115 23.75061
## 
## [[24]]
## [1] 38.14315 28.98875 52.08115 23.75061 41.05378
## 
## [[25]]
## [1] 28.98875 52.08115 23.75061 41.05378 25.44557
## 
## [[26]]
## [1] 52.08115 23.75061 41.05378 25.44557 36.45984
## 
## [[27]]
## [1] 23.75061 41.05378 25.44557 36.45984 39.06300
## 
## [[28]]
## [1] 41.05378 25.44557 36.45984 39.06300 51.00669
## 
## [[29]]
## [1] 25.44557 36.45984 39.06300 51.00669 20.36175
## 
## [[30]]
## [1] 36.45984 39.06300 51.00669 20.36175 25.52056
## 
## [[31]]
## [1] 39.06300 51.00669 20.36175 25.52056 50.19443
## 
## [[32]]
## [1] 51.00669 20.36175 25.52056 50.19443 25.78583
## 
## [[33]]
## [1] 20.36175 25.52056 50.19443 25.78583 33.95468
## 
## [[34]]
## [1] 25.52056 50.19443 25.78583 33.95468 24.16526
## 
## [[35]]
## [1] 50.19443 25.78583 33.95468 24.16526 27.14068
## 
## [[36]]
## [1] 25.78583 33.95468 24.16526 27.14068 25.45315
## 
## [[37]]
## [1] 33.95468 24.16526 27.14068 25.45315 39.12929
## 
## [[38]]
## [1] 24.16526 27.14068 25.45315 39.12929 45.04736
## 
## [[39]]
## [1] 27.14068 25.45315 39.12929 45.04736 41.16389
## 
## [[40]]
## [1] 25.45315 39.12929 45.04736 41.16389 57.60214
## 
## [[41]]
## [1] 39.12929 45.04736 41.16389 57.60214 36.54884
## 
## [[42]]
## [1] 45.04736 41.16389 57.60214 36.54884 61.20000
## 
## [[43]]
## [1] 41.16389 57.60214 36.54884 61.20000 39.65623
## 
## [[44]]
## [1] 57.60214 36.54884 61.20000 39.65623 32.07846
## 
## [[45]]
## [1] 36.54884 61.20000 39.65623 32.07846 54.75515
## 
## [[46]]
## [1] 61.20000 39.65623 32.07846 54.75515 32.74443
## 
## [[47]]
## [1] 39.65623 32.07846 54.75515 32.74443 43.12379
## 
## [[48]]
## [1] 32.07846 54.75515 32.74443 43.12379 46.91964
## 
## [[49]]
## [1] 54.75515 32.74443 43.12379 46.91964 34.99709
## 
## [[50]]
## [1] 32.74443 43.12379 46.91964 34.99709 17.44131
## 
## [[51]]
## [1] 43.12379 46.91964 34.99709 17.44131 40.43741
## 
## [[52]]
## [1] 46.91964 34.99709 17.44131 40.43741 36.31182
## 
## [[53]]
## [1] 34.99709 17.44131 40.43741 36.31182 30.39778
## 
## [[54]]
## [1] 17.44131 40.43741 36.31182 30.39778 41.03766
## 
## [[55]]
## [1] 40.43741 36.31182 30.39778 41.03766 44.27289
## 
## [[56]]
## [1] 36.31182 30.39778 41.03766 44.27289 38.29518
## 
## [[57]]
## [1] 30.39778 41.03766 44.27289 38.29518 24.50860
## 
## [[58]]
## [1] 41.03766 44.27289 38.29518 24.50860 24.94400
## 
## [[59]]
## [1] 44.27289 38.29518 24.50860 24.94400 40.16044
## 
## [[60]]
## [1] 38.29518 24.50860 24.94400 40.16044 38.14636
## 
## [[61]]
## [1] 24.50860 24.94400 40.16044 38.14636 43.91933
## 
## [[62]]
## [1] 24.94400 40.16044 38.14636 43.91933 32.43289
## 
## [[63]]
## [1] 40.16044 38.14636 43.91933 32.43289 42.31418
## 
## [[64]]
## [1] 38.14636 43.91933 32.43289 42.31418 30.16387
## 
## [[65]]
## [1] 43.91933 32.43289 42.31418 30.16387 45.65081
## 
## [[66]]
## [1] 32.43289 42.31418 30.16387 45.65081 56.16752
## 
## [[67]]
## [1] 42.31418 30.16387 45.65081 56.16752 37.48036
## 
## [[68]]
## [1] 30.16387 45.65081 56.16752 37.48036 29.44121
## 
## [[69]]
## [1] 45.65081 56.16752 37.48036 29.44121 36.51768
## 
## [[70]]
## [1] 56.16752 37.48036 29.44121 36.51768 39.57010
## 
## [[71]]
## [1] 37.48036 29.44121 36.51768 39.57010 26.02446
## 
## [[72]]
## [1] 29.44121 36.51768 39.57010 26.02446 54.90216
## 
## [[73]]
## [1] 36.51768 39.57010 26.02446 54.90216 29.60613
## 
## [[74]]
## [1] 39.57010 26.02446 54.90216 29.60613 37.63055
## 
## [[75]]
## [1] 26.02446 54.90216 29.60613 37.63055 30.00859
## 
## [[76]]
## [1] 54.90216 29.60613 37.63055 30.00859 26.07457
## 
## [[77]]
## [1] 29.60613 37.63055 30.00859 26.07457 49.82005
## 
## [[78]]
## [1] 37.63055 30.00859 26.07457 49.82005 43.60941
## 
## [[79]]
## [1] 30.00859 26.07457 49.82005 43.60941 36.62491
## 
## [[80]]
## [1] 26.07457 49.82005 43.60941 36.62491 33.56612
## 
## [[81]]
## [1] 49.82005 43.60941 36.62491 33.56612 18.33115
## 
## [[82]]
## [1] 43.60941 36.62491 33.56612 18.33115 46.33289
## 
## [[83]]
## [1] 36.62491 33.56612 18.33115 46.33289 38.55086
## 
## [[84]]
## [1] 33.56612 18.33115 46.33289 38.55086 27.59973
## 
## [[85]]
## [1] 18.33115 46.33289 38.55086 27.59973 45.33959
## 
## [[86]]
## [1] 46.33289 38.55086 27.59973 45.33959 24.11735
## 
## [[87]]
## [1] 38.55086 27.59973 45.33959 24.11735 30.09035
## 
## [[88]]
## [1] 27.59973 45.33959 24.11735 30.09035 44.83261
## 
## [[89]]
## [1] 45.33959 24.11735 30.09035 44.83261 48.10618
## 
## [[90]]
## [1] 24.11735 30.09035 44.83261 48.10618 37.06335
## 
## [[91]]
## [1] 30.09035 44.83261 48.10618 37.06335 39.46542
## 
## [[92]]
## [1] 44.83261 48.10618 37.06335 39.46542 47.35184
## 
## [[93]]
## [1] 48.10618 37.06335 39.46542 47.35184 40.14985
## 
## [[94]]
## [1] 37.06335 39.46542 47.35184 40.14985 38.77998
## 
## [[95]]
## [1] 39.46542 47.35184 40.14985 38.77998 33.53226
## 
## [[96]]
## [1] 47.35184 40.14985 38.77998 33.53226 31.32142
## 
## [[97]]
## [1] 40.14985 38.77998 33.53226 31.32142 34.91300
## 
## [[98]]
## [1] 38.77998 33.53226 31.32142 34.91300 19.22416
## 
## [[99]]
## [1] 33.53226 31.32142 34.91300 19.22416
## 
## [[100]]
## [1] 31.32142 34.91300 19.22416
# Change that to a function name to actually calculate something for each window
# Note that I add `as.numeric` here, since the outcome is otherwise a list
w5 <- as.numeric(slide(test, mean, .before = 2, .after = 2))
w5
##   [1] 42.59549 42.84570 44.07266 40.47245 39.89102 41.02485 40.33313 41.99695
##   [9] 43.67768 42.60854 41.14140 44.13094 43.87995 42.36141 46.15368 46.05247
##  [17] 45.99763 45.81533 47.61339 43.99540 41.53038 40.98436 34.70785 36.80349
##  [25] 34.26397 35.75819 33.15456 38.60577 34.46737 34.48237 37.22929 34.57385
##  [33] 31.16345 31.92415 32.24818 27.29992 29.96861 32.18715 35.58687 41.67917
##  [41] 43.89830 48.31245 47.23422 45.41713 44.84773 44.08685 40.47161 41.92429
##  [49] 42.50802 35.04525 36.58385 35.22145 31.91708 33.12520 38.49151 38.06307
##  [57] 35.70242 34.61167 34.43622 33.21091 34.33574 35.92060 39.39464 37.39532
##  [65] 38.89621 41.34585 42.35535 39.78075 41.05152 39.83537 33.80676 37.29112
##  [73] 37.32411 37.54668 35.63438 35.64440 34.62798 37.42863 37.22751 37.93901
##  [81] 36.39033 35.69290 34.68119 32.87615 35.23084 36.38808 33.13958 34.39593
##  [89] 38.49722 36.84197 39.91158 43.36388 42.42733 40.56209 39.85587 38.22707
##  [97] 35.73930 31.55416 29.74771 28.48619
# Find the mean value of a window with n = 11, centered:
w11 <- as.numeric(slide(test, mean, .before = 5, .after = 5))
w11
##   [1] 40.18960 40.53644 41.61384 41.44983 43.03481 41.77509 41.63840 41.30162
##   [9] 42.35606 43.48773 40.99725 44.22078 44.10227 44.61642 45.97964 43.55018
##  [17] 44.36520 43.61220 44.54504 41.69642 40.33361 40.68454 38.88697 38.68235
##  [25] 38.33648 35.17544 34.71590 35.81147 35.52029 33.87243 33.91013 32.64530
##  [33] 32.64599 32.88866 33.43270 32.53790 35.92339 36.92596 37.92646 39.18741
##  [41] 39.01684 41.79774 42.30718 43.91360 44.62181 43.70815 41.55155 39.99112
##  [49] 39.96958 37.16937 37.29496 38.40354 36.90718 36.15847 34.50576 33.89129
##  [57] 34.17759 36.58468 35.85700 36.40266 36.38140 36.80078 37.88211 37.80803
##  [65] 38.25645 39.30860 39.25494 38.15295 39.15139 38.89441 38.46862 38.45451
##  [73] 36.67485 36.09781 36.65499 37.30806 37.03973 35.10892 36.95514 35.46866
##  [81] 35.28626 35.98708 35.45151 35.81658 35.36318 35.77198 35.81184 36.34814
##  [89] 38.98638 38.42429 38.44512 38.98444 37.71006 38.69148 37.70364 36.99075
##  [97] 35.75570 35.59224 35.03893 32.98678
# Find the mean value of a window with n = 19, centered:
w19 <- as.numeric(slide(test, mean, .before = 9, .after = 9))
w19
##   [1] 43.03481 41.77509 41.39982 41.43213 42.40737 43.31652 41.95832 42.79801
##   [9] 42.71556 43.35217 44.21311 43.42632 43.14509 42.37627 42.53946 42.69611
##  [17] 42.61381 41.36591 41.17234 40.21252 41.36140 40.47140 39.61354 39.35612
##  [25] 37.76354 38.41457 36.72679 35.98083 34.43566 33.59335 34.35503 34.51401
##  [33] 36.01998 35.20249 37.17351 37.09996 37.44906 38.41197 38.07941 37.66452
##  [41] 39.06231 39.56107 37.83722 38.60836 38.73242 39.06044 39.79186 40.78238
##  [49] 40.73848 39.65749 38.80381 37.88583 37.96991 37.06040 36.68022 37.21894
##  [57] 35.92467 36.60395 37.29046 36.79366 36.50124 37.50526 37.45961 36.91817
##  [65] 38.20788 37.60622 37.25662 36.82049 36.90291 38.21217 38.39370 38.31362
##  [73] 37.76871 37.02652 37.23803 37.67945 36.72939 36.15950 35.45619 35.49035
##  [81] 35.92798 36.37725 36.95824 36.14578 37.07977 37.21236 37.67401 38.06652
##  [89] 37.09291 36.63520 35.71937 35.83900 36.86887 36.27737 36.12581 36.73481
##  [97] 36.07291 37.06920 37.70364 36.99075
# Plot these together: 
combo <- data_frame(time = seq(1:100), test, w5, w11, w19)

ggplot(data = combo) +
  geom_line(aes(x = time, y = test)) +
  geom_line(aes(x = time, y = w5), color = "red") +
  geom_line(aes(x = time, y = w11), color = "blue") +
  geom_line(aes(x = time, y = w19), color = "orange")

Now for an example with our Toolik Station data, let’s say we want to find the average value at each observation, with a window that extends forward and backward n days from the observation:

roll_toolik_15 <- toolik_ts %>% 
  mutate(ma_15d = as.numeric(slide(toolik_ts$mean_airtemp, mean, .before = 7, .after = 7)))

roll_toolik_61 <- toolik_ts %>% 
  mutate(ma_61d = as.numeric(slide(toolik_ts$mean_airtemp, mean, .before = 30, .after = 30)))

ggplot() +
  geom_line(data = toolik_ts, aes(x = date, y = mean_airtemp), size = 0.2, color = "gray") +
  geom_line(data = roll_toolik_15, aes(x = date, y = ma_15d), color = "orange") +
  geom_line(data = roll_toolik_61, aes(x = date, y = ma_61d), color = "red") +
  theme_minimal()

7. Autocorrelation function

We’ll look at outcomes for both daily lags (yikes) and monthly lags (cool).

toolik_ts %>%
  ACF(mean_airtemp) %>%
  autoplot()

toolik_month %>% 
  ACF(monthly_mean_temp) %>% 
  autoplot()

##8. Decomposition

Here we will use STL decomposition (Seasonal, Trend, and Loess) decomposition. You can read about the advantages of STL decomposition here: https://otexts.com/fpp2/stl.html.

toolik_dec <- toolik_month %>% 
  model(STL(monthly_mean_temp ~ season(window = Inf)))

components(toolik_dec) %>% autoplot()